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Abstract One of the main obstacles in extracting the 
Cosmic Microwave Background (CMB) signal from ob- 
servations in the mm-submm range is the foreground 
contamination by emission from galactic components: 
mainly synchrotron, free-free and thermal dust emis- 
sion. Due to the statistical nature of the intrinsic CMB 
signal it is essential to minimize the systematic errors 
in the CMB temperature determinations. 

Following the available knowledge of the spectral be- 
havior of the galactic foregrounds simple, power law-like 
spectra have been assumed. The feasibility of using a 
simple neural network for extracting the CMB temper- 
ature signal from the combined CMB and foreground 
signals has been investigated. As a specific example, we 
have analysed simulated data, like that expected from 
the ESA Planck Surveyor mission. A simple multilayer 
perceptron neural network with 2 hidden layers can pro- 
vide temperature estimates, over more than 80 percent 
of the sky, that are to a high degree uncorrelated with 
the foreground signals. A single network will be able to 
cover the dynamic range of the Planck noise level over 
the entire sky. 
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1 Introduction 

The Cosmic Microwave Background (CMB) is by far 
the most important data set available for cosmological 
investigations. The angular power spectrum of its tem- 
perature and polarization anisotropics contain unique 
information about the basic cosmological parameters. 
Since the discovery of the CMB by Penzias & Wil- 
son (1965), tremendous efforts on the instrumental side 
have been made to improve sensitivity and angular res- 
olution. The latest major experiment is the success- 
ful, ongoing, Wilkinson Microwave Anisotropy Probe 
(WMAP Bennett et al. 2003a) and the next step for- 
ward in CMB measurements will no doubt be the ESA 
Planck mission. 

Due to the statistical nature of the CMB signal it 
is essential to subtract carefully all signals from non- 
cosmological sources, before the data can be used in 
a cosmological context. A clear demonstration of this 
problem has been given by Tegmark et al. (2000) . Their 
model included frequency dependence, angular scale de- 
pendence for each physical component, and variations 
in the frequency dependence across the sky. From these 
simulations they calculated angular power spectra and 
extracted the basic cosmological parameters. For the 
foregrounds they assumed power laws, and made 3 dif- 
ferent sets of assumptions about the power law parame- 
ters: Optimistic (O), Middle-of-The-Road (MID), Pes- 
simistic (PESS). With observational errors like those 
for Planck, they found that the accuracy of most cos- 
mological parameters is degraded by a factor of about 

2 for the MID model, and by a factor of about 5 for 
the PESS model. Therefore, in order to exploit fully 
the scientific capability of an experiment like Planck a 
procedure for reliable removal of all non-cosmological 
signals is mandatory. 

On angular scales larger than about 30' the non- 
CMB signal is dominated by diffuse emission from 
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our own galaxy (Dc Zotti et al. 1999). Synchrotron 
(Haslam et al. 1982) and free-free (HafFner, Reynolds & 
Tufte 1999; Finkbciner 2003) emissions dominate below 
about 60-80 GHz, while at higher frequencies thermal 
dust emission takes over. On smaller angular scales 
the foreground fluctuations are dominated by several 
populations of extragalactic sources, each with different 
spectral behaviour: radio sources, dusty galaxies and 
the Sunyaev-Zeldovich effect from clusters of galaxies. 

The CMB radiation has a virtually perfect black 
body spectrum (Mather 1999), while all known non- 
cosmological signals have a clear non-thermal frequency 
dependence. Therefore, it should be possible to split 
the observed microwave signal into its different com- 
ponents. This depends critically, of course, on the ob- 
servational accuracy and the number of frequencies ob- 
served. 

A lot of different methods for component separation 
of CMB signals have been investigated, including: 

• Maxim um Entropy Me thod (MEM): Hobson et a l. 
|l998l ). Stolyaro v et al . |2002f l. Barreiro et al. |2004h . 
Stolyarov et al. (|2005l ) 

• Internal Li near Co mbination method (ILC): e.g. Ben- 
nett et al. |2003bh 

• Wiener Filtering: e.g. Tegmark & Efstathiou 1 1996h 

• Independent Co mpone nt Analysis (ICA) method: 
e.g. Maino et al. |2002l) 



It is beyond the scope of this paper to give a review 
of all relevant component separation methods, but an 
excellent review can be found in Delabrouille & Cardoso 



( 20071) . 

The statistical properties of the CMB temperature 
fluctuations on a sphere are normally expressed as a 
sum of spherical harmonics: 
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where Yira{0,4') are the spherical harmonic functions. 
The simplest versions of the inflation paradigm predict 
that the CMB temperature fluctuations will be isotrop- 
ically, randomly, Gaussianly distributed on a sphere. 
With this assumption, the statistical properties are 
then completely specified by the second order statis- 
tics, the angular power spectrum C;, 
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Therefore, a lot of effort has been put into construct- 
ing algorithms for foreground removal from CMB data, 
which assure minimum systematic residuals in the de- 
rived angular power spectrum. But. of course, an opti- 



mal method will also assure minimal systematic resid- 
uals in the CMB map itself. 

As demonstrated by Chiang & Naselsky (2007), im- 
portant information can be derived from the phases of 
the spherical harmonic coefficients. They find clear ev- 
idence for systematics in the distribution of phases at 
high latitudes in the WMAP 3y ILC map, and inter- 
pret them as due to residual foreground signals. For the 
Planck mission, one of the key scientific issues will be 
the search for primordial non-gaussianity in the CMB 
map. If clear evidence is not found, then tight upper 
limits on non-gaussianity will be important, emphasiz- 
ing the need not only power for spectra with no sys- 
tematic errors, but also maps. 

The main purpose of this paper is to investigate 
the feasibility of removing galactic foreground emission 
from CMB data by means of a neural network. As 
an example we will specifically examine simulated tem- 
perature data as expected from the ESA Planck CMB 
mission. See ESA-SCI(2005)1 for a detailed technical 
description of the payload and a discussion of the ex- 
pected scientific capability. Planck will provide maps of 
the microwave sky with unprecedented sensitivity and 
angular resolution. The 2 detector systems have been 
designed to cover a very broad frequency range in order 
to facilitate the removal of foregrounds. Therefore, it is 
important to analyse how well the different foreground 
signals can be removed by using only Planck data, and 
so-called 'blind' extraction methods. 

For Planck, there is still uncertainty about how this 
component separation shall be performed in practice. 
Ideally, for each pixel on the sky the component sepa- 
ration procedure should provide an estimate of the tem- 
perature of the CMB, with an accidental error derived 
from the observational accuracy and with negligible sys- 
tematic errors (i.e. an error uncorrelated with any of the 
parameters of the different foreground signals). 

Several methods have been developed to estimate the 
CMB temperature without any assumptions about pri- 
ors. Totally 'blind' methods like the 'Independent Com- 
ponent Analysis' method used by FastICA (Maino et 
al. 2002) and SMICA (Delabrouille et al. 2003; Moud- 
den et al. 2005) can, in principle, estimate both the 
CMB map and the spatial and frequency dependence 
of the foreground sources, only relying on the CMB 
data itself. Similarly, in the 'Internal Linear Combi- 
nation' (ILC) method used by the WMAP team (Ben- 
nett et al. 2003b; Hinshaw et al. 2007), no assumptions 
are made a priori about the foregrounds. Another ap- 
proach was investigated by Brandt et al. (1994). They 
parameterized the spectra of the different foregrounds 
and then fitted them, sky pixel by sky pixel, with 
a non-linear least squares Levenberg-Marquardt algo- 
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rithm (Lcvcnbcrg 1944; Marquardt 1963). This ap- 
proach has been further developed by Linden- V0rnle 
& N0rgaard-Niclsen (1998) and Eriksen ct al. (2006). 

In this paper we will elaborate on the ILC method 
by incorporating the non-linear features of neural net- 
works. To set up these neural networks we parameterize 
the spectra of the different foregrounds in line with the 
Brandt et al. (1994) approach. 

This paper is organized in the following way: In Sec- 
tion 2, we outline the available information about spec- 
tral behaviour of the galactic foregrounds. We present 
the details of the modeling used in Sections 3 and 4. 
In Section 5, we briefly discuss the results found previ- 
ously with the ILC method. Since neural networks are 
not commonly used in astrophysical literature, a brief 
introduction to the neural network concept is given in 
Section 6. We present the neural network used in Sec- 
tion 7 . In Section 8, we discuss our results in relation 
to similar previous investigations. 



Brighter regions away from the galactic plane have 
typical values of a at 100 and 800 MHz of —0.55 and 
—0.8, respectively, Lawson at al. (|l987l ). At higher 
frequencies, the spectral index is expected to steepen 
by ab out 0. 5 due to electron energy losses (Platania 
et al. ( 19981 )). Banday et al. ( 2003 ) derived a mean 
spectral ind ex bet ween 408 MHz and 19.2 GHz from the 
Cottenham (|l987r ) survey, and between 31.5, 53 and 90 
GHz from COBE-DMR data. The steep spectral index 
of — 1.1, for galactic latitude \ b\ > 15 °, is consistent 
with expectations. Bennett et al. ( 2003bl ) claim that the 
steepe ning o ccurs near the K-band (23 GHz). Eriksen 
et al. (|2006l ) conclude that for |&| > 15° the spectral 
index above 10 GHz is likely between —0.7 and — 1.2 . 
As emphasized by Bennet et al.(1992) the synchrotron 
spectral index is dependent on the interstellar magnetic 
fields and is expected to steepen for higher frequencies. 
For example, a change of -Be// from 0.1 to 5 /iG gives a 
change in spectral index of 0.1 in the range 53-90 GHz. 



2 Foreground modeling 

In the following, we identify the different non-CMB 
components to be included in our analysis, and give 
the details of the spectral models used. 

2.1 Synchrotron emission 

The galactic synchrotron emission originates from rel- 
ativistic electrons spiraling in magnetic fields. In the 
galactic plane the magnetic field is ordered on large 
scales, with the field parallel to the spiral arms. Super- 
imposed are small scale structures showing variations 
between the arm and inter-arm regions and with the 
local gas phase. The two components seem to have 
about the same magnitude. At high latitudes, there 
are contributions from the galactic halo and specific 
structures e.g. the North Polar Spur. Variations in the 
spectral index come from variations in the electron en- 
ergy spectrum, depending on the age and the origin 
of the electrons (e.g. supernova, diffuse shocks in the 
interstellar medium). The synchrotron emission is tra- 
ditionally modeled by a single power law 



(3) 



Ssiiy) = As [ — 



where As is the synchrotron flux density at some ref- 
erence frequency i^o and as is the synchrotron spectral 
index. It is most likely that the spectral index will de- 
pend both on the frequency and the position on the sky, 
implying that at least two free parameters are needed 
to describe the synchrotron emission in a specific direc- 
tion. 



2.2 Free- free emission 

The Coulomb interaction between the free electrons and 
ions in the Milky Way results in thermal bremsstrahl- 
ung radiation, traditionally called free-free emission. 
Free-free emission is difficult to observe and simulate 
because at high latitudes it is not dominant at any fre- 
quency. 

Fro m the formulations given by Dickenson et al. 
(|2003l) . the free-free brightness temperature can be de- 
scribed with the following expression: 

T//,f, oc v~'^T-^-^{\n[Q.Qm^v-^] + 1.51nTe), (4) 

where Tg is the electron temperature. 

Shaver et al. |l983h derived the electron temperature 
of HII regions at the galactocentric radius of the Sun as 
7200 K ± 1200 K by means of radio recombination lines. 
Similar results have been found for a larger s ample con- 
taining many weaker sources: Paladini et al. (j2005l ). At 
high latitude, the ionized gas will typically be within 
about 1 kpc of the Sun, and the electron temperature 
is s hould be in the range 7000-8000 K, Dickenson et 
al. 1I2OO3I) . It is possible that the diffuse emission at a 
given galactocentric distance may differ from the emis- 
sion of high density HII regions in th e galactic plane, 
as emphasized by Eriksen et al. ( 2006[ l. 

From these constraints, it is expected that the ef- 
fective spectral index in the frequency range relevant 
for Planck is a// = -0.14. Between 10 and 100 GHz 
a values between —0.1 and —0.2 seem reasonable, and 
a steepening t o —0.3 at hundreds of GHz is foreseen 
(Eriksen et al. (|2006f )). 
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Free-free emission is normally modeled by a simple 
power law 



1^0 



(5) 



Both Eriksen et al. (|2006[ l and t he WM AP team (Ben- 
nett et al. 2003, Hinshaw et al. (|2007t )) assume a con- 
stant free-free spectral index. Future high quality radio 
observations will probably show that this assumption is 
not valid, but it is still reasonable to expect that the 
spectral differences within the free-free galactic compo- 
nent will be smaller than for other foregrounds. 

2.3 Thermal dust emission 

Dust grains in the interstellar medium large enough to 
be in thermal equilibrium with the stellar radiation field 
will emit in the frequency range interesting for CMB 
research. From the all-sky observation of IRAS and 
COBE DIRBE it is known that this thermal emission 
peaks around 140/im. 

Widely accepted models of the dust emission arc 
extrapolations of the high-frequency IRAS, COBE 
DIRBE and FIRAS observations to CMB frequencies 
by Finkbeiner et al. (1999, hereafter FDS). These mod- 
els use combinations of modified blackbody functions 
with different dust temperatures. These combined 
functions can approximate the integrated contributions 
from multiple components of dust, i.e. differences in 
grain properties, size, chemical composition and equi- 
librium temperature. The best-fit model of FDS model 
8 assumes two main components with a tot al of 6 pa- 
rameters. As emphasized by Eriksen et al. ( 20061 ) nei- 
ther Planck nor any planned future CMB experiments 
have sufficient frequency resolution and sensitivity to 
constraint so many parameters for dust emission by 
themselves. 

I n the ir component separation analysis, Eriksen et 
al. ([20061) calculate the dust emission from FDS model 
8, but extract the dust component by using a much sim- 
pler function (FDS model 3), namely a power law com- 
bined with a slowly decreasing function of frequency of 
the order unity over the frequencies of interest (see Eq. 
6). 

2.4 Anomalous dust emission 



discussion so far. They conclude that the topic of the 
anomalous dust emission remains unsettled, probably 
until high quality diffuse measurements in the 5-15 GHz 
range are available for a significant part of the sky. Due 
to the uncertainty of its spectral behaviour and its de- 
tailed relation to the galactic dust component, we will 
not incorporate this component into the present inves- 
tigation. 



3 Parameter ranges chosen for the neural 
network 



As emphasized by Eriksen et al. (j2006f ). neither Planck 
nor any of the planned future CMB missions will have 
sufficient power to constrain very detailed models of 
the galactic components, so simplified models will be 
needed. The situation may be improved by introducing 
additional external data to the fit. Since this paper 
will investigate the Planck data alone, this possibility 
is outside the scope of the paper. 

Concerning the CMB temperature range, by exam- 
ining the WMAP ILC map it follows that a range of 
±750 //K covers the dynamic range of the CMB ther- 
modynamic temperatures well. 

checkwmapcomp Wed Sep 26 13:07:13 2007 
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Fig. 1 Sky fraction covered by the different galactic com- 
ponents as a function of the WMAP W-band (93 GHz) flux. 
From top to bottom the three lines give the coverage of syn- 
chrotron emission, free-free emission and thermal dust. 



3.1 Amplitude ranges at 100 GHZ 



By combining the COBE DMR maps with the DIRBE 
thermal dust emission map at 140/zm, Kogut et al. 
1 19961 ) found evidence for an anomalous component 
with a rising spectrum from 53 to 31.5 GHz. The na- 
ture of this com ponen t has been widely discussed since. 
Hinshaw et al. (j2007l ) give a detailed overview of the 



We have normalized the component spectra at 100 GHz. 
The range of fluxes at this frequency has been deter- 
mined from the WMAP 3yr MEM maps for the three 
galactic components. The sky coverage as a function of 
the W-band (93 GHz) flux limit can be seen in Fig. [TJ 
We have chosen a sky coverage close to the WMAP KpO 
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mask, giving about 80 per cent coverage. The applied 
amplitude ranges are given in Table [H 

3.2 Ranges for the spectral indices 

As discussed above, it is expected that the synchrotron 
emission will not follow a simple power law with a con- 
stant spectral index, but the spectrum will steepen at 
higher frequencies. To take this into account in our sim- 
ulations we have used a steepening of —0.1 at 44 GHz 
and —0.20 for the higher frequencies. The range of 
spectral indices (between 33 and 44 GHz) used in our 
simulations is given in Table [T] The variation in the 
spectral index is similar to the Tegmark et al. ( 2000l ) 
PESS synchrotron model. 

The variations in the spectral index of the free-free 
emission arc expected to be small, and traditi onally it 
is assumed to be constant (e.g. Eriksen et al. ( 20061 )). 
In our simulations we have assumed a range between 
—0.2 and —0.1, similar to the r ange used in the PESS 
model by Tegmark et al. (|2000l ). 

To simplify the model of the ther mal d ust emis- 
sion we have, following Eriksen et al. (l2006l) . used the 
single component FDS model 3, with a temperature 
Ti = 18. IK. So we have 



plotfiras HIGH + LOW lim weight = 5 Thu Sep 06 12:37:04 2007 
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where aa is the dust spectral index and vq^^ = 
3000 GHz. 

In Fig. FIRAS spectra, taken from lambda at 
www.gsfc.nasa.gov, with weights > 5 have been used, 
smoothed with a median filter (n = 10), then nor- 
malised at 850 GHz. It is seen that the range of spectral 
indices given in Table [1] covers the dynamic range of 
the spectra well. The range is comparable to the PESS 
model of Tegmark et al. (2000). 

The ranges of the galactic foreground models applied 
in the simulations are illustrated in Fig. [31 



Table 1 Models parameters used for the neural network. 
The range of flux amplitudes at 100 GHz for the galactic 
components are given in units of erg /cm? / s/ Hz/ sr, 

while the unit of the CMB thermodynamic temperature is 



Component 


Amp 


range 


Spec index 


range 


synchrotron 


: 


1.0 


-1.2 : - 


0.6 


free-free 


: 


2.0 


-0.2 : - 


-0.1 


thermal dust 


: 


8.0 


+2.75 : 4 


-3.75 



CMB temp -750:750 



o 0.0100 =- 




frequency [GHz] 

Fig. 2 Good quality FIRAS spectra (weight > 5), median 
filtered (N = 10) and FDS model 3 with spectral indices 
2.75, 3.25 and 3.75. The excesses seen at low frequencies for 
some of the spectra are due to synchrotron and/or free- free 
emission 
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Fig. 3 The ranges of the galactic foregrounds covered by 
the simulations using the parameters from Table [T] The 
spectrum of a black body with &T — 0.75 mK is shown as a 
dashed-dotted line 



4 The ESA CMB mission: Planck 

The main scientific goal of the next ESA medium class 
mission, Planck, is to measure the CMB sky with im- 
precedented sensitivity and angular resolution. One of 
the main drivers of the design of the payload has been 
to assure proper removal of non-CMB footprints from 
the maps. Planck contains 2 detector systems: the 
Low Frequency Instrument (LFI) based on HEMT tech- 
nology (Principal Investigator N. Mandolesi), and the 
High Frequency Instrument (HFI) based on bolometers 
(Principal Investigator J.L.Puget). The reflector sys- 
tem is provided by ESA and a Danish consortium (Prin- 
cipal Investigator H. U. N0rgaard-Nielsen). LFI covers 
27-77 GHz, while HFI covers 67-1000 GHz, much wider 
than the CMB peak around 200 GHz. AU LFI detec- 
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Table 2 Summary of the Planck instrument characterization for a sky pixel with average exposure time, taken from 
the 'Blue Book', ESA-SCI(2005)1. The la flux sensitivity is derived for a common beam with a 30 ' diameter (unit: 





LFf 


LFI 


LFI 


HFI 


HFI 


HFI 


HFI 


HFI 


HFI 


Center frequency [GHz] 


30 


44 


70 


100 


143 


217 


353 


545 


857 


Bandwidth {/S.v/v) 


0.2 


0.2 


0.2 


0.33 


0.33 


0.33 


0.33 


0.33 


0.33 


Angular resolution [arcmin] 


33 


24 


14 


10 


7 


5 


5 


5 


5 


IfT sensitivity 


0.22 


0.38 


0.90 


0.58 


0.61 


1.19 


2.25 


4.34 


5.22 


Icr sensitivity [/iK] 


6.76 


6.71 


6.77 


2.43 


1.61 


2.46 


7.59 


76.09 


3644.24 



tors are polarization-sensitive, while HFI has a number 
of polarization-sensitive bolometers for all frequencies 
less than 545 GHz. In this paper we examine the com- 
ponent separation problem for the Planck temperature 
measurements alone. The expected sensitivity of the 
Planck detector systems is given in Tabled Due to the 
arrangement of the Planck detectors in the focal plane, 
and the expected scanning strategy, it is reasonable to 
assume that the la values for the different frequencies 
will scale with the same factor, depending on the expo- 
sure time. 




Yn 



5 The Internal Linear Combination (ILC) 
method 

The ILC method was introduced by the WMAP team i n 
their analysis of the 1 year data, Bennett et al. (j2003bl ) . 
The basic assumption is that with a linear combination 
of the 5 WMAP frequency maps the foreground signals 
can be eliminated to a large extent and the cosmological 
signal extracted. With the rather complex non-linear 
functionalities of the foreground signals, this assump- 
tion will only make sense on rather small scales. The 
WMAP team divided the sky into 12 regions, of which 
11 cover the galactic plane (see Hinshaw et al. (|2007l ) 
Fig 8). Due to the basic limitations of the method, the 
WMAP team warned against using the ILC map for 
cosmological investigations. 

Eriksen et al. (|2004il have re-analysed the WMAP 
ILC map and improved the variance of the map signif- 
icantly by introducing a Lagrange multiplier fitting al- 
gorithm. Using detailed Monte Carlo simulations, they 
investigated the limitations of the method and empha- 
sized that great care needs to be taken both in its im- 
plementation and in understanding the influence of the 
residual foregrounds on the cosmological results. 



6 Brief description of the neural network 
concept 

Neural networks and other 'machine learning' methods 
are not commonly used in astrophysical investigations. 



Fig. 4 A sketch of a MultiLayer Perceptron network with 
1 hidden layer 



Recently, a few applications have focused on fast cosmo- 
logical paramete r estim ations from C MB p ower spec- 
tra (Auld et al. (|2007h . Fendt et al. (|2007l ). Habib et 
al. (I2OO7II I. Here we are investigating the feasibility 
of neural networks to obtain systematic-free CMB tem- 
perature estimates from noisy millimeter/submillimeter 
sky maps. 

Neural networks are analog computational systems 
whose structure is inspired by studies of the human 
brain. Many different architectures of neural network 
have been developed to tackle a variety of problems. An 
excellent i ntrodu ction to neural networks can be found 
in Bishop ( 19951 ). In the current paper we have only in- 



vestigated one of the most simple and also most popular 
networks, namely the multilayer perceptron (MLP). 

An MLP consists of a network of units (called neu- 
rons) as illustrated in Fig.Hl Each unit is shown as a cir- 
cle and the lines connecting them are known as weights. 
The network can be understood as an analytical map- 
ping between a set of input variables Xm {"m = 1, M) 
and a set of output variables ?/„ (n = 1, A^). The in- 
put variables are applied to the M input units on the left 
of the figure: M = 4 and N = 2 in the shown example. 
These variables are multiplied by a matrix of parame- 
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ters wim [l = 1, (™ ~ 1, M) corresponding to 

the first layer of links. Here L is tlic number of neurons 
in the middle (hidden) layer: L = 3 in the shown ex- 
ample. This results in a vector of inputs to the units in 
the hidden layer. Each component of this vector is then 
transformed by a non-linear function F, so we have 



zi 



/ M \ 
F ^ WlmXm + Bj 



(7) 



where 8; is an offset (or threshold). We used the Neural 
Network Toolbox in the MATLAB software environ- 
ment {www.mathworks.com) and have exploited the 
tansig function as the non-linear function: 



tansig(x) 



1 + exp(-2 x) 



-1, 



(8) 



It is seen that tansig has an S-like shape, with values 
falling within the interval [—1 : 1]. From the hidden 
layer to the output units a linear transformation with 
weights Wni (n ~ l,...,N;l = 1,...,L) and offsets 0„ 
are applied 



L 

Vn = J2 
1=1 



WnlZl 



(n = l,...,7V), 



(9) 



By combining Eqs. 1 and 2, it is seen that the entire 
network transforms the inputs Xm to the output yn by 
the following analytical function 



M 



yn{xi,...,XM) = ^ ^ \ ^ '^im.Xm. + B/J -I- e„, 

(10) 



1=1 



Clearly, such an MLP can easily be generalized to more 
than one hidden layer. 

Hornik, Stinchcombe & White (jl989r ) have shown 



that an MLP with a single hidden layer can approx- 
imate, with arbitrary accuracy, any non-linear multi- 
variate mapping, subject to only mild restrictions, 
provided the number of processing elements is large 
enough. Unfortunately, there is no general scheme for 
defining the optimal network, nor a guarantee that the 
network will converge within a reasonable timeframe. 

Given a set of P sample input and output vector 
pairs, {xP^ y^} p — 1,...,P, for a specific mapping, a 
technique known as error back propagation, can derive 
estimates of the parameters wim, ©m and Wnh ©n, 
so that the network function (jlOp will approximate the 
required mapping. The training algorithm minimizes 
the error function 



N 



enet = EE[y»(^'') 



(11) 



The general concept for setting up a neural network is 
first to provide a test data set. This is traditionally 
split into three data sets: one set used directly to train 
the network; a validation data set used in the iteration 
scheme, not directly in the training, but in the eval- 
uation of the improvement of the network; and a test 
set which is only used at the end of the training to get 
an independent estimate of the accuracy of the derived 
network. 

Unfortunately, an MLP has a tendency, if allowed 
to iterate too many times, to reproduce accidental fea- 
tures in the test data, and therefore not provide optimal 
results when applied on completely independent data 
sets. Of course, you can follow how the iteration of the 
network is proceeding and stop it whenever it seems 
to fit the details of the test sample too closely. How- 
ever, it is desirable to assure a good generalisation (also 
called regularization) of the network in an automated, 
unbiased, way. 

In MATLAB, an appro ach following the Bayesian 
framework, MacKay ( 19921 ). has been implemented in 
the trainbr procedure. The weights and offsets are 
assumed to be random variables with specified distri- 
butions. The regularisation parameters are related to 
the unknown variances associated with these distribu- 
tions. In trainbr the regulation part is combined with 
a training scheme following the Levenberg-Marquardt 
non-linear least-squares method. An advantage of this 
procedure is that it provides a measure of how many 
parameters (weights and biases) are effectively used by 
the network, thus helping in setting up a reasonably 
sized network. 

Normally, neural networks are initiated by choosing 
completely random weights. Ngyen and Widrow (1990) 
have shown that by choosing weights and bias values for 
each layer, so that the active regions of its neurons are 
distributed approximately evenly over the input space, 
the network will be better able to form an approxima- 
tion of any arbitrary function. The advantage of this 
method is that the network in general converges much 
faster. In MATLAB this method is implemented in 
initnw. 



7 The applied neural network 

Since Planck is scanning the sky in a rather inhomoge- 
neous way, it is expected that the obs ervati onal errors 
will vary across the sky. Bernard et al. ( 2002[ ) have sim- 
ulated the exposure times for each sky pixel (10' x 10') 
for different assumptions about the scanning strategy. 
They found for all scanning models considered by the 
Planck Science Team that the exposure times vary by 
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Fig. 5 The standard deviation of tiie ST determinations 
of a [12,3] MLP network (errors = 0.5 * Planck values) as 
a function of the error level of the test data (errors = f * 
Planck values, f = 0.25-2.0). Each test data set consists of 
100,000 spectra, with parameters in the ranges specified in 
Table [T] It is seen that the relation closely follows a straight 
line (dashed) 

Fig. 6 Errors in the ST determinations for a set of 100,000 
independent simulated spectra (/sen = 1.5) run through the 
reference [12,3] network (/sen = 0.5) as function of the syn- 
chrotron constant As and spectral index as . The correla- 
tion functions between the ST residuals and the 2 parame- 
ters are —0.0125 and 0.0427, respectively. A median filtering 
(N=100) of the residuals is given, and the extent of the 65 
percent, 95 percent and 99 percent limits of the error distri- 
bution, derived in 10 sub-intervals, are shown as the solid 
horizontal lines, all displaced — 40^K for clarity. (Figure 
can be found at ftp://ftp.spacecenter.dk^ oM6//tMnn/as^ro — 
fig6 — 9.zip) 

only a factor of about 10 over the sky. Due to the de- 
sign of the Planck focal plane it is reasonable to assume 
that sensitivities in the different frequency bands will 
vary with the same factor. In Table[5]we have given the 
expected observation sensitivity per frequency band for 
an average 30' sky pixel. So it is expected that these 
sensitivities will vary by the same factor, fsem hi the 
range 0.6-1.6 across the sky. 

If symmetrical Gaussian beams and white noise are 
assumed, e.g. no 1// noise contribution, the noise of the 
individual sky pixels is independent, and it is possible to 
treat each sky pixel separately. These assumptions were 
also made in the Planck Working Group 2 Component 
Separation Challenge, see below. 

Under these conditions, and assuming that the three 
foreground components can be treated as discussed in 
Section 2, with the parameter ranges specified in Ta- 
ble [l] it is straight forward to set up a suitable neural 
network. 



Fig. 7 Same residuals and network as Fig |S] but the 
residuals are plotted as a function of the free-free con- 
stant Aff and spectral index a//. The correlation func- 
tions between the ST residuals and the 2 parameters 
are —0.0063 and —0.0065. (Figure can be found at 
|ftp://ftp.spacecenter.dk/pub/hunn/astro-fig6-9.zip[ ) 

Fig. 8 Same residuals and network as Fig |6] but the 
residuals are plotted as a function of the thermal dust 
constant Ad and spectral index q^. The correlation 
functions between the ST residuals and the 2 parame- 
ters are 0.0025 and 0.0158. (Figure can be found at 
ftp://ftp.spacecenter.dk/pub/hunn/astro-fig6-9.zip I 



Fig. 9 Same residuals and network as Fig|6]but the residu- 
als are plotted as a function of the input CMB temperature 
ST. The correlation function is 0.0003. (Figure can be found 
at , ftp://ftp.spacecenter.dk/pub/hunn/astro-fig6-9.zipJ 

In this paper only simple MLP networks have been 
investigated to examine their ability to remove fore- 
ground signals from CMB data. 

We have set up networks to establish an algorithm 
for deriving the CMB temperature anisotropies from 
simulated spectra, using all 7 foreground parameters 
and including noise, covering the 9 Planck frequencies 
such that [o6si, i = 1, 9] ST. 

To assure the feasibility of the network, it is, of 
course, important to assure that the expected param- 
eter ranges are covered, (see Table [T|), and that the 
different combinations of parameters are covered to a 
reasonable extent. Experience has shown that a train- 
ing data set of about 10,000 spectra is sufficient. 

To set up a neural network, the number of hidden 
layers, the number of neurons for each layer, the scale 
factor on the observational errors, fsen, and the num- 
ber of data in the training set, Ntrain, needs to be 
specified. For a running validation of the development 
of the network iteration process, a data set only for this 
purpose is also calculated with Nval spectra, normally 
25 percent of Ntrain- 

The final network is then tested with a data set with 
Ntest spectra (/sen could be different than that for 
the network) derived completely independently of the 
2 data sets used to train the network. 

With these parameters fixed, the following procedure 
is used: 

1. Draw 7 parameter values, uniformly distributed 
within the parameter ranges given in Table [T] 

2. Calculate the combined fluxes of the 4 foreground 
components at the 9 Planck frequencies. 

3. For each frequency add a Gaussian randomly dis- 
tributed number, multiplied by Icr values given in 
Table [5] and the assumed scaling factor fsen ■ 
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4. Repeat 1-3 until the desired number. Ntrain, of 
spectra in the training set have been obtained. 

5. Repeat 1-3 until the desired number, Nyal, of spec- 
tra in the validation set have been obtained. 

6. Train the neural network to find the weights describ- 
ing the mapping between the input spectra and the 
true temperature anisotropy, known for the simu- 
lated input. 

7. Obtain independent test samples of spectra by re- 
peating steps 1-3 Ntest times with different fsen- 

8. Run the Attest spectra through the network to get 
an independent estimate of the reliability of the net- 
work. 

The basic neural network used in this paper is an 
MLP with 2 hidden layers, with 12 and 3 neurons, re- 
spectively (referred to as a [12,3] network). The train- 
ing set contained 10,000 spectra and the sensitivity scal- 
ing factor, fsem was 0.5. To set up this kind of network, 
the requirements on computer power are quite modest. 
The training of the reference network took a total of 
about 15 CPU min. on a Sun Fire V40z (8 AMD Op- 
tron @ 2.2 GHz, 16 GB RAM). 

To represent the range of sensitivities expected for 
Planck we have run several test data sets through the 
network, each with 100,000 spectra and with fsen vary- 
ing from 0.25 to 2.0, covering the sensitivity range rel- 
evant for Planck. Fig. [5] shows the rms of the CMB 
temperature estimates as a function of fsen- It is seen 
that the relation follows to high accuracy a simple scal- 
ing over the full range of applied errors. 

To demonstrate the level of systematic errors in the 
temperature estimates. Figs. El [71 [8] and [9] show the re- 
sults obtained by the reference neural network and the 
test data set with /sen = 1.5. The figures show the dif- 
ference between input CMB temperature and the tem- 
perature derived by the network plotted as a function 
of the 7 basic parameters in our sky model. It is seen 
that the distribution of 6T - 5r(NNET) is very close 
to Gaussian (skewness ~ 0.0133, kurtosis = 0.0129). 
In each figure the correlation between the 6T residu- 
als and the respective input parameter is also given: 
corr(x,y) =< (x- < x >)(y- < y >) > /(a^ay). 

In the figures, the median (N = 100) filtered, re- 
ordered, residuals are plotted together with the extent 
of the 65 percent, 95 percent and 99 percent limits of the 
error distribution, obtained for 10 subintervals, shown 
as the horizontal lines, all shifted — 40^K for clarity. 

It can be seen that errors of the CMB temperatures 
derived by the network arc to a high degree uncorre- 
lated with any of the basic input parameters. It can 
also be seen that the scatter of the ST determinations is 
independent of the amplitudes of the foreground com- 
ponents. This is in strong contrast to direct fitting. 



e.g. Levenbcrg-Marquardt, where the errors increase 
rapidly when the input amplitudes get close to zero. 

8 Discussion 

Brandt et al. (1994) simulated 10° x 10° patches on the 
sky with an angular resolution of 1° for various choices 
of observing frequencies. They assumed simple power 
laws for the synchrotron and free-free emissions, while 
the dust emission was assumed to be a combination of 
two black body spectra, (Ti = 20.4K and T2 = 4.77K), 
with an emissivity power law having a variable index 
and a variable scaling factor. They fitted the simulated 
spectra for each sky pixel by means of a Levenbcrg- 
Marquardt algorithm. They emphasized that this di- 
rect fitting is unstable due to the fact that the number 
of parameters is only slightly smaller than the number 
of observed frequencies, which is actually the case for 
all CMB experiments. 

It is especially difficult to fit the synchrotron and 
free-free emissions simultaneously. This problem is nor- 
mally smoothed over by combining the two components 
into a single power law in the fitting routine (Brandt 
et al. 1994; Linden- V0rnle & N0rgaard-Nielsen 1998). 
Clearly, with this simplification systematic errors will 
be introduced at some level in the CMB temperature 
estimates. One way to handle this problem is to assume 
a constant spectral index for the free-free emission e.g. 
Erikscn et al. (2006). 

Eriksen et al. (2006) elaborated further on the 
Brandt et al. approach. They assumed a simple power 
law for the synchrotron emission and a power law with 
a constant spectral index for the free-free emission. 
The thermal dust emission was assumed to follow FDS 
model 8, but in the fitting routine they applied the 
much simpler FDS model 3. They set up a Markov 
Chain Monte Carlo (MCMC) algorithm to fit the spec- 
trum of each individual sky pixel. They estimate that 
the fitting for each pixel will take about 100 seconds, 
implying that it is unfeasible to fit the spectra of all 
50 million Planck sky pixels. Therefore, Eriksen et al. 
relaxed the angular resolution of the data to be fitted 
with the full MCMC algorithm. In their simulations 
they smoothed the original simulated maps (resolution 
~ 1 deg) with a Gaussian (FWHM 6 deg). These low- 
resolution data were then run through the MCMC al- 
gorithm. For the high resolution data they used the 
low-resolution non-linear parameters (power law spec- 
tral indices) and fitted only the linear parameters (the 
scaling factors). 

Eriksen et al. give details of their MCMC fit only 
for a single pixel, in the galactic plane (1 = 58° and b 
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= 0°). They emphasize that the parameters fomrd for 
this pixel are clearly correlated (see their Figs. 3 and 4). 
The MCMC algorithm rejected the fit for this pixel at 
a high confidence level, mainly because the spectrum is 
fitted by an FDS model 3 and not an FDS model 8 spec- 
trum like the input spectrum. Eriksen et al. give the 
residual maps for the CMB temperature and for each of 
the component parameters, both for the low and high 
resolution data. The problem with FDS model 3, ver- 
sus model 8, is clearly evident at high latitudes in their 
goodness-of-fit plot (Eriksen et al. Fig. 6). They fixed 
the problem at high latitudes merely by introducing, 
by hand, a constant dust spectral index. Eriksen et al. 
found that the distribution of CMB temperature resid- 
uals fitted reasonably well to the rms errors estimated 
by the MCMC algorithm. 

Eriksen et al. do not provide information about the 
correlation between the errors in their CMB tempera- 
ture estimates and the basic paramaters of their MCMC 
fitting scheme. 

Hansen et al. (2006) have suggested that instead of 
removing the galactic foregrounds by means of a linear 
combination of observations obtained at different fre- 
quencies (i.e. ILC), a linear combination of differences 
between different frequencies should be used. Hansen 
et al. assume that the frequency spectra of the fore- 
ground components are independent of position on the 
sky. By analysing local regions separately this assump- 
tion can be relaxed. As emphasized by Hansen et al., 
this method increases the noise level in the data, imply- 
ing that it is mainly useful for studies at larger scales 
where noise is not dominant. 

The blind methods for component separation of 
CMB data are, of course, not totally blind. These meth- 
ods model the total sky signal as a linear mixture of a 
few independent emission processes. The observation of 
the sky with a detector d is then a noisy linear mixture 
of Nc components: 



to change across the sky (e.g. the coefficients for the 
ILC fit of the WMAP 3 yr data, Hinshaw et al. 2007). 
With the improved sensitivity of Planck, it is reasonable 
to assume that even larger differences will be found. A 
way out of this problem is to isolate areas of the sky 
where assumption 1 is reasonable (Delabrouille et al. 
2003). 

Since all 3 galactic components are strongly concen- 
trated towards the galactic plane, they are to some ex- 
tent correlated. At higher latitudes the situation is un- 
clear, but a detailed analysis of the correlations between 
the components will be one of the scientific tasks for 
Planck, so some interesting science could be missed by 
keeping assumption 2. 

With the sophisticated and complex nature of Planck 
detector systems, combined with the weakness of the 
CMB anistropy signal, it will not be a surprise if the 
final sensitivity of the 9 frequency maps will first be 
achieved after very detailed investigation of all possible 
systematic errors (e.g. residual 1/f noise and tempera- 
ture variations). It will also probably be too optimistic 
to expect that all the systematic errors will have been 
removed from the data. 

Due to the importance of establishing a reliable pro- 
cedure for removal for obtaining a foreground - free 
CMB map, Planck Working Group 2 has produced de- 
tailed maps with reasonable observational errors called 
the Planck Sky Model (PSM). As emphasized above, 
these maps are produced assuming symmetrical Gaus- 
sian beams and white observational noise, as in our 
neural networks. The maps have been run through 8 
different separation algorithms and the prel imina ry re- 
sults have been discussed by Leach et al. (|2008l ). As 
seen in their Fig. 5, on a colour scale of ±30/xK, small 
systematic errors are present in the residual maps for 
all 8 algorithms. In a forthcoming paper we will analyse 
these PSM data, both for the simulated Planck and the 
WMAP maps by means of dedicated neural networks. 



TV, 

yd(0,0) = Adjs,ie,c^) + ridie,^) (12) 
i=i 

where Sj is the emission template for the jth astrophys- 
ical process. The coefiicients Adj reflect the emission 
laws and detector properties, while Ud accounts for the 
noise. As emphasized by Delabrouille et al. (2003) these 
methods have 3 basic assumptions: 

1) The mixing matrix, A^j, is position-independent. 

2) The components are statistically independent. 

3) Noise is uncorrelated between the detectors. 

In the real world it will be difficult to fulfill these 
requirements. The spectral indices and relative contri- 
butions of the different galactic components are known 



9 Conclusions 

In this paper we investigate mainly the concept of using 
simple neural networks (MLPs) to remove the galactic 
foregrounds from the CMB signal, making some sim- 
plifying assumptions: only 3 galactic components are 
taken into account (synchrotron, free-free and thermal 
dust emission); the spectra of the foregrounds follow 
simple modifications to power laws; the observational 
noise is white (Gaussian) ; Gaussian beams are assumed. 
Under these assumption it has been demonstrated that 
for more than 80 per cent of the sky a neural network 
can provide reliable estimates of the CMB temperature 



N0rgaard-Nielsen and j0rgensen: Cleaning of CMB maps by a neural network 



11 



with errors, which to a high degree are uncorrelated to 
the basic parameters of the sky model used. 

The spectra of the foregrounds in the Planck fre- 
quency range, close to the galactic plane, are at present 
known to be complex, but little is known about the 
complexity at high latitudes. Therefore, in the case of 
Planck the details of the modeling of the foregrounds 
will first be established when the data has been care- 
fully reduced and analysed. This will, no doubt, result 
in the need for including more galactic components (e.g. 
spinning dust) and, most likely, a revision of the spec- 
tral behavior of the components will be required. From 
the available knowledge of the Planck detectors, it is 
clear that the noise will not be white, and it will be 
a challenging task to investigate how to minimize the 
influence of non- white noise features. It could well be 
that more sophisticated neural networks than MLP's 
will be needed in the Planck pipeline. 

To further test the capabilities of neural networks to 
clean CMB temperature maps, we will, in a forthcoming 
paper, analyse the Planck Sky Model simulations of 
both Planck and WMAP data, as well as some real 
data, the recently released WMAP 5yr data. 
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